Spoken Document Retrieval Using Neighboring Documents and Extended Language Models for Query Likelihood Model

Authors

  • Kazuaki Ogawa
  • Tatsuaki Murahashi
  • Hiroaki Taguchi
  • Koudai Nakajima
  • Masanori Takehara
  • Satoshi Tamura
  • Satoru Hayamizu
Abstract

This paper proposes several approaches for the NTCIR-12 SpokenQuery & Doc-2 task [1]. Our methods are based on the query likelihood model, a probabilistic language model, with Dirichlet smoothing. We try to improve retrieval performance by using extended language models. First, we build and use a language model obtained from related research papers. Second, we propose a smoothing method that employs the cache model and an N-gram model based on Kneser-Ney smoothing. Finally, we propose a smoothing method that uses neighboring documents. Experiments were conducted to evaluate these methods using the NTCIR-12 test sets.
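
The abstract names the query likelihood model with Dirichlet smoothing as the baseline but does not give its formula. As a rough illustration only, the sketch below uses the common textbook form, in which a document d is scored by summing log P(w|d) over the query terms, with P(w|d) = (c(w,d) + mu * P(w|C)) / (|d| + mu). The function names, the toy documents, and the value of mu are assumptions made for this example and do not come from the paper; the extended language models, the cache model, and the neighboring-document smoothing described above are not included.

```python
import math
from collections import Counter


def dirichlet_query_log_likelihood(query, doc, collection_counts, collection_len, mu=1000.0):
    """Log-probability of generating `query` from `doc` under Dirichlet smoothing.

    query, doc: lists of tokens (hypothetical pre-tokenized input).
    collection_counts: term frequencies over the whole collection.
    collection_len: total number of tokens in the collection.
    mu: smoothing parameter (1000 is an arbitrary illustrative value).
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        # Background (collection) probability of the term.
        p_coll = collection_counts.get(term, 0) / collection_len
        # Dirichlet-smoothed document probability.
        p = (doc_counts[term] + mu * p_coll) / (doc_len + mu)
        if p == 0.0:
            # Term unseen in the whole collection: the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, mu=1000.0):
    """Rank documents (dict of id -> token list) by query log-likelihood, best first."""
    collection_counts = Counter(t for tokens in docs.values() for t in tokens)
    collection_len = sum(collection_counts.values())
    scored = [
        (doc_id, dirichlet_query_log_likelihood(query, tokens, collection_counts, collection_len, mu))
        for doc_id, tokens in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    toy_docs = {
        "a": ["speech", "retrieval", "model"],
        "b": ["language", "model", "smoothing"],
    }
    for doc_id, score in rank(["speech", "retrieval"], toy_docs):
        print(doc_id, score)
```

In this form, a larger mu pulls each document model toward the collection model. Methods such as those the abstract describes would change the background or document estimate rather than the overall scoring loop.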

Related resources

Spoken Document Retrieval Using Extended Query Model and Web Documents

This paper proposes a novel approach for spoken document retrieval. In our method, a query model, which is one of the probabilistic language models, is adopted in order to compute the probability of generating a given query from each document. We employ not only a “static” document collection consisting of targeted documents but also a “dynamic” document collection including web documents related w...

Effects of Query Expansion for Spoken Document Passage Retrieval

One of the major challenges for spoken document retrieval is how to handle speech recognition errors within the target documents. Query expansion is promising for this challenge. In this paper, we apply relevance models, a type of query expansion method, for the spoken document passage retrieval task. We adapted the original relevance model for passage retrieval. We also extended it to benefit ...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Robust retrieval models for false positive errors in spoken documents

Speech recognition errors and out-of-vocabulary (OOV) words, which are referred to as false negative errors, are common challenges in spoken document processing. To deal with them in spoken content retrieval (SCR), an SCR method that incorporates spoken term detection (STD) as a pre-processing stage (referred to as STD-SCR) has been proposed. However, the STD-SCR tends to increa...

Publication date: 2016